ABSTRACT


Major processor manufacturers have embraced the high-level synthesis (HLS) design
philosophy. HLS offers the potential to explore the design space of electronic circuits and
systems more efficiently than traditional methods. In this thesis, we investigate the
application of HLS to hardware-oriented security and trust by developing a model of a simple
16-bit Central Processing Unit in the SystemC modeling language. We enhanced our
processor with a simple security mechanism that enforces a memory integrity policy. The
integrity policy allows a region of the program labeled as trustworthy to modify any address
in data memory, but another region of the program labeled as untrustworthy is restricted
to only being able to modify a specific region of data memory. Our timing results show
that adding the integrity policy enforcement mechanism has a negligible effect on overall
system performance.
Executive Summary


Major processor manufacturers have embraced the high-level synthesis (HLS) design
philosophy. For example, Xilinx has incorporated HLS into its Vivado suite of Electronic
Design Automation (EDA) tools. HLS offers the potential to explore the design space of
electronic circuits and systems more efficiently than traditional methods. The HLS design
process begins with a functional model that is iteratively refined to progressively finer
levels of detail, eventually resulting in a cycle-accurate model of the system. In this thesis,
we investigate the application of HLS to hardware-oriented security and trust (HOST) by
developing a model of a simple 16-bit CPU in the SystemC modeling language. Using
SystemC, designers can express both hardware and software constructs in C++; therefore, the
hardware and software of an embedded system can be simulated in the same environment,
rather than using separate hardware and software simulators. Only a C++ compiler and the
SystemC library are needed to design and simulate a circuit.

Our processor is based on the design from the “Nand to Tetris” course that teaches computer
science concepts across all levels of system abstraction by constructing a general-purpose
computer system from the ground up, starting with digital logic gates and then progressing
to a 16-bit CPU architecture, assembler, computer, and high-level language programming.
To demonstrate the applicability of the HLS design approach to hardware-oriented security
and trust, we enhanced our processor with a simple security mechanism that enforces a
memory integrity policy. The integrity policy allows a region of the program labeled as
trustworthy to modify any address in memory, but another region labeled as untrustworthy
is restricted to only being able to modify a specific region of memory. Our timing results
show that adding the integrity policy enforcement mechanism has a negligible effect on
overall system performance. HLS has the potential to help designers of security
enhancements as well as designers of the systems themselves, and a SystemC approach has the
potential to make hardware design more accessible to computer scientists. Future work
will involve exploring a wider variety of programs, policies, policy enforcement
mechanisms, and processors, as well as increasing the memory size, as cycle-accurate modeling
of a large number of memory cells requires a very large RAM overhead, which is a known
challenge with SystemC modeling.
CHAPTER 1:

Introduction  and  Motivation 


Trustworthy system development is a major concern for the Department of Defense, which
operates a large variety of complex systems that must be resilient to a wide array of
developmental and operational attacks. Developing trustworthy systems is expensive due to
the high non-recurring engineering (NRE) costs of developing hardware and software, and
the small customer base over which to amortize that high NRE cost. In addition, system
designers are increasingly concerned with the security of the entire supply chain, including
hardware [1], [2] and design tools [3]. Compromised hardware has the potential to
undermine policy enforcement mechanisms implemented in software. Addressing the security
problem of the supply chain of electronics is very challenging due to the large number
of vendors of electronic components and intellectual property (IP). Building hardware in
a trusted foundry is one approach to addressing these issues, but building custom
hardware in this way is costly. In addition to the fabrication costs, which increase according to
Rock’s law in superlinear fashion, the engineering costs are also very high [4], [5]. Even
using reconfigurable hardware, such as field-programmable gate arrays (FPGAs), does not
necessarily reduce the design cost, although it may reduce the fabrication cost.

The goal of this thesis is to reduce the cost and time for developing trustworthy hardware
by leveraging high-level synthesis (HLS) to efficiently explore the design space in order to
determine which design point optimally balances the tradeoffs of concern for the customer.
HLS allows for the development of functional models at a high level of abstraction that can
be quickly implemented in software [6]. Further refinement of a functional model results
in a transactional model, and further refinement of a transactional model results in a timing
model. Finally, the cycle-accurate model is the lowest level of abstraction and the finest
level of granularity.

Development and simulation of a complete cycle-accurate model is too expensive for all
points in the design space. Therefore, the HLS methodology relies on quickly building
coarse-grained models (e.g., the functional models) to quickly determine important metrics
such as power and performance at a coarse level of granularity. By allowing the designer to
evaluate tradeoffs efficiently at a coarse level of granularity, HLS enables the design process
to be more efficient than traditional methods. The designer can make important decisions
at this stage before embarking on the tedious efforts and expensive costs of refining the
coarse-grained design down to a cycle-accurate, fine-grained model. The beauty of this
approach is that once the optimal point within the design space is determined, the
high-level model can immediately be utilized, and the process of refinement can begin.

A major language for system modeling is SystemC, which is based on the C++
programming language. SystemC is very simple and consists of a C++ library that can be readily
downloaded for free. SystemC allows a designer to express a functional model in a
modified C++ language. In addition to expressing software, SystemC provides the advantage
of being able to design hardware in this language. This is an improvement over traditional
techniques in which software is designed in a traditional programming language, and
hardware is designed in a traditional hardware description language, or HDL. The problem with
the traditional approach is that the hardware is simulated in a hardware simulator, while the
software is simulated in a software simulation environment. Having separate simulation
environments for hardware and software is inefficient and inhibits the ability to co-design
the hardware and software.

We are not the first to apply HLS to hardware trust. Bathen and Dutt developed PoliMakE,
which uses HLS to explore policies for multi-core processors [7]. PoliMakE is built on top
of their SystemC simulation engine. SystemC is also used in PHiLOSoftware, which helps
engineers design trustworthy systems based on multi-core processors [8].

Concerns about information security for modern computers have existed nearly since their
inception. With the widespread use of modern computers, the need to provide information
security became more evident [9]. Saltzer and Schroeder focus on safeguarding information
for systems with multiple users on the same system [9]. As malicious software emerged,
including computer viruses, worms, and Trojans, patches and firewalls were developed as
countermeasures. While malicious software poses a tremendous challenge, recognition
of the problem of malicious hardware has emerged, as the integrated circuit supply chain
is world-wide. The potential for hardware breaches has increased as global consumption
relies more heavily on outsourced equipment [2]. This thesis will assess and demonstrate
how high-level synthesis (HLS) can facilitate the design of policy enforcement circuitry.
CHAPTER 2:
Related  Work 


The  fields  of  computer  security  and  computer  hacking  have  evolved  over  time.  Just  as 
programmers  work  to  protect  system  security,  skilled  and  motivated  hackers  will  attempt 
to  exploit  weaknesses  in  the  protection  mechanisms. 

Multiple  publications  touch  on  the  matter  of  safeguarding  computer  systems.  One  of  the 
most  seminal  of  these  works  is  J.  H.  Saltzer  and  Michael  D.  Schroeder’s  “The  Protection  of 
Information  in  Computer  Systems,”  written  in  1975  [9].  The  authors  of  this  work  discuss 
how  the  invention  of  the  Von  Neumann  general-purpose  architecture  drastically  reduced 
the production cost of modern computers, which allowed widespread use of the machines.
Saltzer  and  Schroeder  wrote  their  work  with  the  “key  concern”  of  safeguarding  information 
against  multiple  users  on  the  same  system.  As  computer  users  and  designers  get  savvier  in 
their  attempts  to  prevent  software  security  breaches,  malicious  hackers  must  find  new  areas 
to attack, where advanced security has not yet been implemented. A more recent
publication entitled “Trustworthy Hardware: Identifying and Classifying Hardware Trojans”
addresses the possibility of security threats introduced at the hardware level of modern
computing. The potential for hardware breaches has increased as global consumption
relies more heavily on outsourced equipment [2].

Karri et al. survey the emerging discipline of hardware-oriented security and trust [2]. As
hackers develop increasingly sophisticated exploits, they devise novel ways to bypass
security mechanisms. Hardware vulnerabilities represent a means for such an exploit to
occur. Karri et al. present a taxonomy of malicious circuitry for classifying malicious
inclusions and countermeasures.

The  Department  of  Defense  (DOD),  like  the  rest  of  the  information  technology  world, 
finds  itself  reliant  on  global  outsourcing  for  the  manufacturing  of  digital  infrastructure. 
Therefore,  the  DOD  is  interested  in  mitigating  supply  chain  threats  by  using  enhanced 
government  services,  encouraging  improved  commercial  practices,  and  requiring  supply 
chain risk management [10]. While the NSA has established dozens of trusted foundries,
manufacturing all military electronics in trusted foundries may not be feasible. Design
and  manufacture  of  all  hardware  and  software  intellectual  property  in  house  is  expensive, 
time  consuming,  and  might  not  yield  a  defect-free  result  [2].  The  vulnerability  exists  for 
untrusted  foundries  to  maliciously  modify  a  circuit  without  user  knowledge. 

A three-year investigation conducted by the Federal Bureau of Investigation (FBI) from
2004 to 2006 discovered “counterfeit Cisco routers in U.S. defense, finance, and university
networks” [11]. Even more alarming than the actual discovery of the compromised
hardware was the fact that many of the routers came directly from “untrustworthy sources in
foreign countries.” While the investigation did not detect malicious hardware injections
and concluded that the infiltrators’ motivations appeared merely fiscal, the investigation
“vividly illustrate[d] the vulnerability” users face when purchasing unverified hardware [2,
p. 39].

Karri et al. suggest the creation of a “hardware Trojan taxonomy” to address the possible
introduction of malicious circuitry made in “untrusted factories” [2]. They state, “To
be  trustworthy,  hardware  must  exhibit  only  the  functionality  for  which  it  was  designed, 
nothing  more  and  nothing  less;  conceal  any  information  about  the  computation  performed 
through  any  side  channels  such  as  power  and  delay;  and  be  transparent  only  to  the  designer 
while  remaining  opaque  to  others,  who  should  know  nothing  about  its  design  and  internal 
states” [2]. They also provide a more detailed definition of a hardware Trojan as “a malicious
and deliberately stealthy modification made to an electronic device such as an IC. It
can  change  the  chip’s  functionality  and  thereby  undermine  trust  in  the  systems  using  that 
trojaned  chip”  [2]. 

The hardware Trojan taxonomy created by Karri et al. is based on five different categories:
the insertion phase, abstraction level, activation mechanism, effects, and location [2]. The
insertion phase covers all possible points at which a hacker could maliciously alter
hardware and remain undetected throughout the testing cycle. The activation mechanism category
deals with the ways in which a hardware Trojan could be activated. For example,
the hardware Trojan could be either “always on” or “triggered” by an external event. The
effects category addresses four generalized results the Trojan could create; it could “change
the functionality, downgrade performance, leak information, [or] den[y] service” [2]. With
the hardware Trojan taxonomy established, the real work of identifying and preventing
hardware Trojan implementation can begin.

A follow-on article entitled “Trustworthy Hardware: Trojan Detection and
Design-for-Trust Challenges” delves even further into installing safeguards against potential malicious
hardware elements [1]. The semiconductor industry’s reliance on globalization and outsourcing
is again cited as increasing the vulnerability to hardware Trojans [1]. To
offset this challenge, the authors suggest “either the developer must make the IC design and
fabrication process trustworthy or the client must verify the IC for trustworthiness” [1].

The  trust  element  of  chip  manufacturing  is  a  primary  concern  for  both  the  Department  of 
Defense and the general consumer market as a whole. In addition to trustworthiness of
design, cost, performance, and functionality are important considerations in creating a
“winning design.” Kurt Keutzer states, “The overall goal of electronic embedded system design
is  to  balance  production  cost  with  development  time  and  cost  in  view  of  performance  and 
functionality  considerations.  Manufacturing  cost  depends  mainly  on  the  hardware  (HW) 
components  of  the  product”  [4]. 

Our work explores the potential benefits of HLS for trustworthy system development.
Major chip manufacturers such as Xilinx and Intel have adopted HLS in their design practices,
and we argue that HLS can also facilitate the design of policy enforcement mechanisms.
To validate our hypothesis, we construct a general-purpose computer in the SystemC
language and enhance it with a simple memory integrity policy enforcement mechanism. We
then  evaluate  system  performance  both  with  and  without  the  security  enhancement.  While 
our  work  falls  within  the  scope  of  hardware-oriented  security  and  trust,  we  do  not  claim  to 
directly  address  the  problem  of  malicious  hardware  inclusions  in  our  work. 
CHAPTER 3:
Design  Flow 


In our efforts to demonstrate the capability of utilizing HLS in the design of
policy enforcement circuitry, we chose to follow the coursework of Nisan and Schocken
[12]. The Nisan and Schocken text is complemented with online material found at
http://www.nand2tetris.org [12]. The course is often simply referred to as "nand2tetris."
The premise of their work is to help individuals both with and without computer science
backgrounds to comprehend the process of building a modern computing machine, as well
as to implement software to run on the machine. Our project focuses on the first portion of
their text and coursework: the construction of the modern computer.

The Nisan and Schocken course [12] incrementally builds upon logic gates to construct
a 16-bit modern computer named HACK. Modern computers are built with transistors
that physically implement simple Boolean functions, called logic gates. One of the most
fundamental logic gates, described in more detail in Section 3.1, is the NAND gate.
Logically, the NAND gate performs two functions. First, it takes two inputs and performs
the AND function on them. The truth table produced from this function is shown as Table
3.1. Next, it applies the NOT function, which yields the inversion of its
input. The truth table for the NOT function is provided in Table 3.2. Combining both of
these functions produces the results shown in Table 3.3. As explained below,
NAND gates are universal building blocks for combinational circuitry.


a    b    out
0    0    0
0    1    0
1    0    0
1    1    1

Table 3.1: The truth table for the AND function of discrete math.


The HACK computer is capable of performing simple 16-bit operations. Ultimately, there
are 28 functions the ALU can compute (a complete list of the 28 computations is found
on page 67 of Nisan and Schocken’s book). This chapter follows the development of each
logic gate in the manner described by Nisan and Schocken, and describes how it relates
to the overall development of the HACK computer. By the end of this chapter, all logic
gates needed to construct the HACK machine have been created and connected to yield the
HACK machine in SystemC.

IN   OUT
0    1
1    0

Table 3.2: The NOT function's truth table.


A    B    (AB)’
0    0    1
0    1    1
1    0    1
1    1    0

Table 3.3: The NAND gate's truth table.

3.1  “In  the  beginning,  there  was  NAND...” 

Because of their universality, NAND gates are fundamental building blocks of all
combinational circuits. Electrical engineers can easily build NAND gates out of only a handful
of transistors. All other logical gates can be constructed using NAND gates.

We created the NAND gate in SystemC by declaring two boolean inputs, A and B. The
NAND gate logically "ands" the two boolean inputs and then negates that result, which
produces the output. In the SystemC code used for this project, F represents the final boolean
output of the NAND gate.

To properly construct and run the NAND gate in SystemC, the boolean inputs A and B
are logically "anded" together, and then their result is "notted" via the function "do_nand2."
Figure 3.1 illustrates the composition of the NAND gate.
Figure 3.1: The "NAND" gate is a fundamental building block for digital logic designs.

The following is the actual SystemC code for the NAND gate:

    #include <systemc.h>          // SystemC library header

    SC_MODULE(nand2)              // declare nand2 sc_module
    {
        sc_in<bool> A, B;         // input signal ports
        sc_out<bool> F;           // output signal port

        void do_nand2()           // a C++ function
        {
            F.write( !(A.read() && B.read()) );
        }

        SC_CTOR(nand2)            // constructor for nand2
        {
            SC_METHOD(do_nand2);  // register do_nand2 with kernel
            sensitive << A << B;  // sensitivity list
        }
    };
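As a quick cross-check on the module above, the same Boolean function can be written as an ordinary C++ function and compared against Table 3.3 at compile time. This is only a functional-level sketch that does not use the SystemC library; the name nand_fn is a hypothetical helper introduced for illustration and is not part of the SystemC source listed above.

```cpp
// Functional-level sketch of the NAND gate in plain C++ (no SystemC needed).
// nand_fn is an illustrative name, not part of the thesis code.
constexpr bool nand_fn(bool a, bool b) { return !(a && b); }

// Compile-time comparison against the NAND truth table (Table 3.3).
static_assert(nand_fn(false, false) == true,  "0 NAND 0 should be 1");
static_assert(nand_fn(false, true)  == true,  "0 NAND 1 should be 1");
static_assert(nand_fn(true,  false) == true,  "1 NAND 0 should be 1");
static_assert(nand_fn(true,  true)  == false, "1 NAND 1 should be 0");
```

A model at this level carries no notion of ports, signals, or simulated time, which is what the SystemC module adds.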

3.2  Next  came  ...  AND 

All additional logic gates required to build a digital computer can be derived from NAND
gates. An AND gate in SystemC consists of two NAND gates wired together. The AND
gate handles two boolean inputs A and B, uses one internal boolean signal S1, and produces
a boolean output F. The SC_CTOR sets up the digital layout of the logic device by feeding
A and B into the first NAND gate, n1. The output of n1, S1, is then fed as both the A and
B inputs to the second NAND gate, n2. This procedure yields a final output F for the AND
gate. Figure 3.2 illustrates the composition of the AND gate:


Figure 3.2: The AND gate utilizes two NAND gates to produce its output.

Here is the SystemC code for an AND gate:

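    SC_MODULE(_and2)
    {
        sc_in<bool> A, B;
        sc_out<bool> F;

        nand2 n1, n2;             // two internal NAND gates
        sc_signal<bool> S1;       // internal signal between n1 and n2

        SC_CTOR(_and2) : n1("N1"), n2("N2")
        {
            n1.A(A);
            n1.B(B);
            n1.F(S1);             // S1 = NAND(A, B)

            n2.A(S1);
            n2.B(S1);
            n2.F(F);              // F = NAND(S1, S1) = NOT(S1)
        }
    };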
3.3 Many more logic gates are now possible

We construct an OR gate in SystemC by utilizing three NAND gates, n1, n2, and n3. The
logical design created utilizing the SC_CTOR accepts two boolean inputs, A and B,
generates two internal signals, S1 and S2, and produces a final boolean output, F. The first
NAND gate n1 uses A for both its inputs and outputs S1. The second NAND gate n2 uses
B for both its inputs and produces signal S2. The third NAND gate n3 accepts S1 and S2 as
inputs, yielding F as the final output. Figure 3.3 illustrates the composition of the OR gate.


Figure 3.3: The OR gate employs three NAND gates to produce output F.

Here is the SystemC code for the OR gate:

    SC_MODULE(_or2)
    {
        sc_in<bool> A, B;
        sc_out<bool> F;

        nand2 n1, n2, n3;         // three internal NAND gates
        sc_signal<bool> S1, S2;   // internal signals

        SC_CTOR(_or2) : n1("N1"), n2("N2"), n3("N3")
        {
            n1.A(A);
            n1.B(A);
            n1.F(S1);             // S1 = NOT(A)

            n2.A(B);
            n2.B(B);
            n2.F(S2);             // S2 = NOT(B)

            n3.A(S1);
            n3.B(S2);
            n3.F(F);              // F = NAND(S1, S2) = A OR B
        }
    };

3.4  Have  you  ever  dealt  with  a  NOT? 

Constructing a NOT gate in SystemC merely requires one internal NAND gate, n. The input
IN is wired to both inputs of the NAND gate n. The output from n yields the final output,
which is named OUT in the code below. Figure 3.4 illustrates the composition of the NOT gate.


Figure 3.4: The NOT gate merely requires one NAND gate.

Here is the SystemC code for the NOT gate:

    SC_MODULE(_not1)
    {
        sc_in<bool> IN;
        sc_out<bool> OUT;

        nand2 n;                  // single internal NAND gate

        SC_CTOR(_not1) : n("N")
        {
            n.A(IN);
            n.B(IN);
            n.F(OUT);             // OUT = NAND(IN, IN) = NOT(IN)
        }
    };
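The three wirings described in Sections 3.2 through 3.4 can be mirrored in plain C++ to confirm that each one reproduces the intended truth table. The sketch below is a functional-level illustration only and does not use SystemC; the helper names nand_fn, and_fn, or_fn, and not_fn are assumptions introduced here and do not appear in the SystemC source.

```cpp
// Functional-level sketch (plain C++, no SystemC): AND, OR, and NOT
// expressed with NAND only, following the wirings of Sections 3.2-3.4.
// All function names here are illustrative.
constexpr bool nand_fn(bool a, bool b) { return !(a && b); }

// AND: S1 = NAND(A, B); F = NAND(S1, S1)
constexpr bool and_fn(bool a, bool b) { return nand_fn(nand_fn(a, b), nand_fn(a, b)); }

// OR: S1 = NAND(A, A); S2 = NAND(B, B); F = NAND(S1, S2)
constexpr bool or_fn(bool a, bool b) { return nand_fn(nand_fn(a, a), nand_fn(b, b)); }

// NOT: OUT = NAND(IN, IN)
constexpr bool not_fn(bool in) { return nand_fn(in, in); }

// Spot checks at compile time.
static_assert(and_fn(true, true) && !and_fn(true, false), "AND wiring");
static_assert(or_fn(false, true) && !or_fn(false, false), "OR wiring");
static_assert(not_fn(false) && !not_fn(true), "NOT wiring");
```

The OR wiring is an application of De Morgan's law: NAND(NOT A, NOT B) equals A OR B.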

3.5  The  XOR  gate 

Creating an XOR gate in SystemC requires four NAND gates, n1, n2, n3, and n4. We
create  the  logical  circuit  by  connecting  two  boolean  inputs  (A  and  B)  to  the  NAND  gates 
as  shown  in  Figure  3.5.  In  addition  to  the  two  boolean  inputs,  the  XOR  circuit  uses  three 
internal  boolean  signals  and  produces  the  final  boolean  output  (F). 


Figure  3.5:  The  XOR  gate  requires  four  NAND  gates. 
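One common way to wire four NAND gates with three internal signals into an XOR is sketched below in plain C++. This is an assumption about the wiring, since the exact connections of Figure 3.5 are not reproduced in the text, and the names nand_fn and xor_fn are hypothetical helpers rather than identifiers from the SystemC source.

```cpp
// Functional-level sketch (plain C++, no SystemC) of a four-NAND XOR.
// The wiring is the textbook one and is assumed, not taken from Figure 3.5.
constexpr bool nand_fn(bool a, bool b) { return !(a && b); }

constexpr bool xor_fn(bool a, bool b) {
    // S1 = NAND(A, B)    (gate n1, shared by n2 and n3 in hardware)
    // S2 = NAND(A, S1)   (gate n2)
    // S3 = NAND(B, S1)   (gate n3)
    // F  = NAND(S2, S3)  (gate n4)
    return nand_fn(nand_fn(a, nand_fn(a, b)), nand_fn(b, nand_fn(a, b)));
}

static_assert(!xor_fn(false, false), "0 XOR 0 should be 0");
static_assert(xor_fn(false, true),   "0 XOR 1 should be 1");
static_assert(xor_fn(true, false),   "1 XOR 0 should be 1");
static_assert(!xor_fn(true, true),   "1 XOR 1 should be 0");
```

The expression evaluates NAND(A, B) twice only because it is written as a single return statement; in a circuit the signal S1 would be produced once and fanned out.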

3.6  The  Multiplexor  (a.k.a  "Mux") 

To  build  a  multiplexor  (aka  Mux)  in  SystemC,  we  utilize  the  logic  gates  described  above. 
The  Mux  requires  two  AND  gates,  one  NOT  gate,  one  OR  gate,  and  three  input  booleans 
(A, B, and SEL). It uses three internal signals, S1, S2, and NOTSEL, to produce a final
output F. Figure 3.6 illustrates the composition of a Mux.
Figure 3.6: The Mux consists of two AND gates, one NOT gate, and one OR gate.
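The Mux composition can be sketched at the functional level in plain C++ as follows. The sketch assumes the nand2tetris convention that SEL = 0 selects A and SEL = 1 selects B, which the text above does not state explicitly; the function names are illustrative and the gates are written with C++ operators rather than NAND for brevity.

```cpp
// Functional-level sketch of the Mux (plain C++, no SystemC).
// Assumption: SEL = 0 passes A through, SEL = 1 passes B through.
constexpr bool not_fn(bool in)        { return !in; }
constexpr bool and_fn(bool a, bool b) { return a && b; }
constexpr bool or_fn(bool a, bool b)  { return a || b; }

constexpr bool mux(bool a, bool b, bool sel) {
    // NOTSEL = NOT(SEL); S1 = AND(A, NOTSEL); S2 = AND(B, SEL); F = OR(S1, S2)
    return or_fn(and_fn(a, not_fn(sel)), and_fn(b, sel));
}

static_assert(mux(true, false, false) == true,  "SEL = 0 should pass A");
static_assert(mux(true, false, true)  == false, "SEL = 1 should pass B");
```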


3.7  D-Mux  that,  please 

A demultiplexer, or D-Mux, performs the inverse function of the multiplexor. Rather than
selecting between two inputs, the D-Mux "is an output selector which has a single input
and directs it to one of N outputs" [13].

Constructing a D-Mux requires the use of two AND gates, a1 and a2, and one NOT gate,
n. The SC_CTOR wires the logic design as follows: external boolean input IN is wired as
one of the inputs required for both a1 and a2. An additional external value, SEL, is wired
directly to AND gate a2 as its second required input. SEL is also run through the NOT gate
n, where its result, NOTSEL, is used as the second input for AND gate a1. The output of a1
is A. The output of a2 is B. Depending on the value of the SEL bit, the initial input value
IN will be passed through as output A or output B. Figure 3.7 illustrates the composition of
a demultiplexer.


Figure 3.7: The D-Mux consists of two AND gates and one NOT gate.
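The D-Mux wiring described above translates directly into a small plain C++ sketch. The struct DmuxOut and the function dmux are hypothetical names used only for illustration; the routing (SEL = 0 sends IN to A, SEL = 1 sends IN to B) follows from a1 receiving NOTSEL and a2 receiving SEL.

```cpp
// Functional-level sketch of the D-Mux (plain C++, no SystemC).
// DmuxOut and dmux are illustrative names, not part of the SystemC source.
struct DmuxOut {
    bool a;  // output of AND gate a1: IN AND NOTSEL
    bool b;  // output of AND gate a2: IN AND SEL
};

constexpr DmuxOut dmux(bool in, bool sel) {
    return DmuxOut{in && !sel, in && sel};
}

static_assert(dmux(true, false).a && !dmux(true, false).b, "SEL = 0 routes IN to A");
static_assert(!dmux(true, true).a && dmux(true, true).b,   "SEL = 1 routes IN to B");
```

The output that is not selected is driven to 0 rather than left undefined.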
3.8 Handling larger input arrays

Each of the previously described elementary logic gates in our digital design handles
single-bit inputs, but for the machine we are building, 16-bit buses must be dealt with. By
combining multiple single-bit gates, the 16-bit input buses can be processed. In the case of
processing a 16-bit value through the NOT gates, we combine sixteen NOT gates to form a
NOT_16 gate. The composition of the NOT_16 circuit is illustrated in Figure 3.8.
Figure 3.8: Sixteen NOT gates are placed together to handle a 16-bit input bus.


A  simpler  visual  display  is  provided  in  Figure  3.9.  In  this  diagram,  the  sixteen  bits  are  all 
fed  into  the  NOT_16  gate  simultaneously,  producing  a  16-bit  output,  F/16. 


Figure 3.9: The NOT_16 gate handles a 16-bit bus input, negates each bit simultaneously,
and yields output F/16.
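The two views of the NOT_16 gate, sixteen single-bit gates (Figure 3.8) versus one bus-wide operation (Figure 3.9), can be compared in a short plain C++ sketch. It assumes the bus is represented as a std::uint16_t; the names not16_from, not16_gates, and not16_bus are hypothetical and are not taken from the SystemC source.

```cpp
#include <cstdint>

// Single-bit NOT gate (functional view).
constexpr bool not_fn(bool in) { return !in; }

// Gate-by-gate view (Figure 3.8): one NOT gate per bit, for bits i through 15.
constexpr std::uint16_t not16_from(std::uint16_t in, int i) {
    return i == 16
        ? std::uint16_t{0}
        : static_cast<std::uint16_t>(
              (not_fn(((in >> i) & 1u) != 0u) ? (1u << i) : 0u) | not16_from(in, i + 1));
}
constexpr std::uint16_t not16_gates(std::uint16_t in) { return not16_from(in, 0); }

// Bus-level view (Figure 3.9): the whole 16-bit bus is inverted in one step.
constexpr std::uint16_t not16_bus(std::uint16_t in) {
    return static_cast<std::uint16_t>(~in);
}

static_assert(not16_gates(0x00FF) == 0xFF00, "low byte set becomes high byte set");
static_assert(not16_gates(0xA5A5) == not16_bus(0xA5A5), "both views agree");
```

The same pattern of replicating a single-bit gate across sixteen bit positions applies to the other 16-bit gates in this section.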
To complete a fully functioning simple modern computer, we continued to build additional
logic chips as specified by Nisan and Schocken. The computer they designed is called
the HACK machine, and we largely follow their design, creating it in
SystemC instead of HDL. Just as we did with the NOT_16 gate, we combine other smaller
logic gates to form logic circuits that can handle 16-bit bus inputs and produce 16-bit bus
outputs. This effort quickly produced the following three logic gates: the AND_16 gate,
the OR_16 gate, and a Mux_16 gate. They are illustrated in Figures 3.10, 3.11, and 3.12.


Figure 3.10: The AND_16 gate handles 16-bit bus inputs (A/16 and B/16), "AND-ing" each of
the input bits simultaneously to produce the output F/16.


Figure 3.11: The OR_16 gate handles 16-bit bus inputs (A/16 and B/16) simultaneously and
produces the output F/16.


Figure 3.12: The Mux_16 gate handles two 16-bit bus inputs simultaneously and uses SEL to
select a final output of F/16.


An additional logic circuit named the OR_8 gate utilizes seven regular OR gates to combine
eight boolean inputs: A, B, C, D, E, F, G, and H. The first-level OR gates are or1,
or2, or3, and or4. Their outputs are wired to or5 and or6. Finally, the outputs of or5 and
or6 are input into or7, which yields the final output F. The logical implication of this circuit
is as follows: if any of the inputs is true, the output of the OR_8 will be true. The only time
the OR_8 gate will produce a false or zero output is when all input booleans are also false.
Figure 3.13 illustrates the internal composition of the OR_8 gate.


Figure  3.13:  The  OR_8  gate  produces  a  "true"  bit  output  if  any  of  the  eight  input  bits  are  true. 
If  all  eight  boolean  inputs  are  "false"  or  zero  values,  the  output  of  the  OR_8  will  be  zero. 
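The seven-gate tree can be sketched in plain C++ as shown below. The pairing of inputs to the first-level gates (or1 takes A and B, or2 takes C and D, and so on) is an assumption, since the text names the gates but not their exact inputs; or_fn and or8 are illustrative names only.

```cpp
// Functional-level sketch of the OR_8 gate (plain C++, no SystemC).
// Input pairing at the first level is assumed; names are illustrative.
constexpr bool or_fn(bool a, bool b) { return a || b; }

constexpr bool or8(bool a, bool b, bool c, bool d,
                   bool e, bool f, bool g, bool h) {
    return or_fn(                                  // or7
        or_fn(or_fn(a, b), or_fn(c, d)),           // or5 combines or1 and or2
        or_fn(or_fn(e, f), or_fn(g, h)));          // or6 combines or3 and or4
}

static_assert(!or8(false, false, false, false, false, false, false, false),
              "all inputs false gives false");
static_assert(or8(false, false, false, false, false, false, true, false),
              "any single true input gives true");
```

Arranging the gates as a balanced tree keeps the longest path at three gate delays, compared with seven for a linear chain.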


3.9  Continuing  the  construction  of  larger  logic  gates 

The next logic gate we designed in SystemC was the Mux4way16 gate. The Mux4way16
logic gate performs the function of selecting between four 16-bit bus inputs: A/16, B/16,
C/16, and D/16. Also required in this effort are two select bits, SEL0 and SEL1. These
select bits allow a selection to be made between the four options, which yields the final
output F/16. Internally, two additional 16-bit buses, S/16 and T/16, are required. Figure
3.14 illustrates the internal composition of the Mux4way16 gate.

Progressing sequentially in our design, the next logical step was creating a Mux8way16
gate. Just as the name suggests, the Mux8way16 gate selects a final output bus from eight
16-bit input buses. The construction of the Mux8way16 in SystemC requires eight input
buses, named A/16, B/16, C/16, D/16, E/16, F/16, G/16, and H/16, and three select bits
(SEL0, SEL1, and SEL2). Two Mux4way16 gates, m0 and m1, and one Mux_16 gate, m2,
are utilized internally to produce the desired outcome. The composition of the Mux8way16
is illustrated in Figure 3.15.
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SELO 


SELl 


Figure  3.14:  The  Mux4wayl6  selects  between  four  16-bit  input  buses  and  produces  one  16-bit 
output  bus. 
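
The way the wider multiplexers are composed from narrower ones can be sketched in plain C++. This is a behavioral illustration under stated assumptions, not the thesis's SystemC code: the names Bus16, mux16, mux4way16, mux8way16, and check_mux are hypothetical, and treating SEL0 as the least significant select bit is an assumption.

```cpp
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;  // one 16-bit bus (hypothetical alias)

// Mux16: passes a when sel is 0 and b when sel is 1.
inline Bus16 mux16(Bus16 a, Bus16 b, bool sel) { return sel ? b : a; }

// Mux4way16: SEL0 picks within each pair, SEL1 picks between the pairs.
inline Bus16 mux4way16(Bus16 a, Bus16 b, Bus16 c, Bus16 d,
                       bool sel0, bool sel1) {
    const Bus16 s = mux16(a, b, sel0);  // internal bus S/16
    const Bus16 t = mux16(c, d, sel0);  // internal bus T/16
    return mux16(s, t, sel1);
}

// Mux8way16: two Mux4way16 gates (m0, m1) feed one Mux16 (m2).
inline Bus16 mux8way16(const Bus16 (&in)[8],
                       bool sel0, bool sel1, bool sel2) {
    const Bus16 lo = mux4way16(in[0], in[1], in[2], in[3], sel0, sel1);  // m0
    const Bus16 hi = mux4way16(in[4], in[5], in[6], in[7], sel0, sel1);  // m1
    return mux16(lo, hi, sel2);                                          // m2
}

// Self-check: select value 5 (SEL2=1, SEL1=0, SEL0=1) picks the sixth bus.
inline void check_mux() {
    const Bus16 in[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    assert(mux8way16(in, true, false, true) == 15);
}
```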


3.10  Once  you  Mux,  it’s  easy  to  D-Mux 

Just as we expanded the multiplexers to handle more inputs, demultiplexers can also be
expanded to handle more outputs. A demultiplexer performs the inverse function of the
multiplexer, taking a single input, IN, and directing it to one of N outputs. Our SystemC
implementation of the D-Mux4way uses three D-Mux gates, d1, d2, and d3, and two select
bits, SEL0 and SEL1. Two internal booleans, S and T, are also used. The four final
outputs are labeled A, B, C, and D. The composition of the D-Mux4way is illustrated in
Figure 3.16.

To extend the demultiplexer to eight outputs, we created the Dmux8way gate. The Dmux8way
consists of one D-Mux2way gate, d1, and two D-Mux4way gates, d2 and d3. Three input
selectors, SEL0, SEL1, and SEL2, are also required. The eight outputs are labeled A, B,
C, D, E, F, G, and H. Figure 3.17 shows the logical construction of an 8-way
demultiplexer.


Figure 3.15: The Mux8way16 selects its output, O/16, from among eight 16-bit buses.


Figure 3.16: The D-Mux4way directs an incoming bit to one of four outputs.
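
The demultiplexer composition described above can be sketched the same way in plain C++. This is a behavioral illustration, not the thesis's SystemC code: the names Pair, dmux, dmux4way, dmux8way, and check_dmux are hypothetical, and the assignment of select bits to stages (the highest select bit on d1) is an assumption.

```cpp
#include <array>
#include <cassert>

// Two-way D-Mux: routes in to output a when sel is 0, to b when sel is 1.
struct Pair { bool a, b; };
inline Pair dmux(bool in, bool sel) { return {in && !sel, in && sel}; }

// D-Mux4way: d1 splits on SEL1 into S and T; d2 and d3 split each on SEL0.
inline std::array<bool, 4> dmux4way(bool in, bool sel0, bool sel1) {
    const Pair st = dmux(in, sel1);    // d1: S = st.a, T = st.b
    const Pair ab = dmux(st.a, sel0);  // d2
    const Pair cd = dmux(st.b, sel0);  // d3
    return {ab.a, ab.b, cd.a, cd.b};   // outputs A, B, C, D
}

// D-Mux8way: d1 splits on SEL2; two D-Mux4way gates (d2, d3) finish the job.
inline std::array<bool, 8> dmux8way(bool in, bool sel0, bool sel1, bool sel2) {
    const Pair halves = dmux(in, sel2);              // d1
    const auto lo = dmux4way(halves.a, sel0, sel1);  // d2 -> A..D
    const auto hi = dmux4way(halves.b, sel0, sel1);  // d3 -> E..H
    return {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
}

// Self-check: with select value 6, only the seventh output (G) carries the input.
inline void check_dmux() {
    const auto out = dmux8way(true, false, true, true);
    for (int i = 0; i < 8; ++i) assert(out[i] == (i == 6));
}
```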
Figure 3.17: The D-Mux8way directs an incoming bit to one of eight outputs.


3.11 Time for some simple addition: Introducing the
Half-Adder and Full-Adder

As we move forward with the construction of the HACK machine in SystemC, we need to
perform arithmetic operations. We implement the simplest arithmetic operation, addition,
in two steps: constructing a HalfAdder, followed by a FullAdder.


The HalfAdder in SystemC takes two boolean inputs, A and B, and yields two boolean
outputs, SUM and CARRY. Internally, the HalfAdder requires one XOR gate and one AND
gate. Their composition is shown in Figure 3.18.


Now that the HalfAdder has been constructed, assembling the FullAdder becomes possible.
The FullAdder is built from two HalfAdders (ha1 and ha2) and an OR gate (or). It takes
three boolean inputs (A, B, and CIN), uses three internal boolean signals (S, T, and U),
and produces two boolean outputs, SUM and COUT. The diagram in Figure 3.19 illustrates
the assembly of the FullAdder.


Figure  3.19:  The  FullAdder 
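
A minimal behavioral sketch of the two adders in plain C++ follows. It is not the thesis's SystemC code: the names AdderOut, half_adder, full_adder, and check_adders are hypothetical, and the mapping of the internal signals S, T, and U to particular wires is an assumption, since the text lists the signals without saying where each one runs.

```cpp
#include <cassert>

struct AdderOut { bool sum, carry; };

// HalfAdder: SUM = A XOR B, CARRY = A AND B.
inline AdderOut half_adder(bool a, bool b) { return {a != b, a && b}; }

// FullAdder: two HalfAdders (ha1, ha2) and one OR gate.
inline AdderOut full_adder(bool a, bool b, bool cin) {
    const AdderOut ha1 = half_adder(a, b);          // assumed: S = sum, T = carry
    const AdderOut ha2 = half_adder(ha1.sum, cin);  // assumed: U = carry
    return {ha2.sum, ha1.carry || ha2.carry};       // SUM, COUT = T OR U
}

// Self-check against ordinary integer addition for all eight input combinations.
inline void check_adders() {
    for (int v = 0; v < 8; ++v) {
        const bool a = v & 1, b = v & 2, c = v & 4;
        const AdderOut r = full_adder(a, b, c);
        const int total = int(a) + int(b) + int(c);
        assert(r.sum == ((total & 1) != 0));
        assert(r.carry == (total >= 2));
    }
}
```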


3.12 Performing addition on input buses

Now equipped with the ability to perform addition, our next goal was to perform
addition on 16-bit values. As with our previous implementations of logic chips for
larger buses, the ADD_16 gate is built from repeated smaller parts: it contains sixteen
FullAdders in order to process the operands.

The parts of the ADD_16 gate are arranged as follows in SystemC. Two 16-bit input
buses, A/16 and B/16, carry the operands to be added together. Each of the individual
FullAdder gates (fa0 through fa15) generates an internal carry-out signal (c-out[n]),
which is passed to the next sequential FullAdder gate. The last carry-out bit is
discarded. The SUM output from each internal FullAdder is placed on the output bus.
Figure 3.20 illustrates the composition of the ADD_16 circuit.

Figure 3.20: The ADD_16 gate adds two 16-bit values together. The SUM outputs of the
sixteen FullAdders form the output bits F[0] through F[15].


Continuing our efforts to enhance the HACK machine’s ability to perform arithmetic
operations, implementing an incrementor was the next logical step. Our incrementor, named
INC_16, takes an arbitrary 16-bit value and increases (or increments) it by one.
Designing the INC_16 gate in SystemC requires one ADD_16 gate. An abstraction of the
implementation of INC_16 is shown in Figure 3.21.
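
The ripple-carry arrangement of the ADD_16 and its reuse in the INC_16 can be sketched in plain C++. This is an illustration rather than the thesis's SystemC code: the names Bus16, full_adder, add16, inc16, and check_add16 are hypothetical, and the loop stands in for the sixteen separately instantiated FullAdder gates.

```cpp
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;

struct AdderOut { bool sum, carry; };

// Behavioral FullAdder: sum and carry of three input bits.
inline AdderOut full_adder(bool a, bool b, bool cin) {
    const bool s = (a != b);
    return {s != cin, (a && b) || (s && cin)};
}

// ADD_16: sixteen FullAdders chained through their carry signals.
inline Bus16 add16(Bus16 a, Bus16 b) {
    Bus16 out = 0;
    bool carry = false;  // the first FullAdder has no incoming carry
    for (int i = 0; i < 16; ++i) {
        const AdderOut fa = full_adder((a >> i) & 1, (b >> i) & 1, carry);
        if (fa.sum) out = static_cast<Bus16>(out | (1u << i));
        carry = fa.carry;  // feeds the next FullAdder in the chain
    }
    return out;  // the carry out of the last FullAdder is discarded
}

// INC_16: an ADD_16 whose second operand is the constant bus ONE/16.
inline Bus16 inc16(Bus16 in) { return add16(in, 1); }

// Self-check, including the wrap-around caused by the discarded carry.
inline void check_add16() {
    assert(add16(2, 3) == 5);
    assert(add16(0xFFFF, 1) == 0);
    assert(inc16(41) == 42);
}
```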

Figure 3.21: The INC_16 circuit increments an inputted value, IN, by one. Its ADD_16
gate takes IN/16 and the constant bus ONE/16 and produces OUT/16.


Another logic gate needed for the HACK machine is the Controlled_Zero16. The
Controlled_Zero16 circuit accepts any 16-bit input, IN/16, and, when instructed by
the select bit C, zeros all bits of the bus, producing an output bus, OUT/16, of all zeros.
If the C select bit is not asserted, the input value is passed to OUT/16 without
alteration. To build the Controlled_Zero16 circuit in SystemC, one AND16
gate, one Mux16 gate, and one NOT16 gate are used. Two internal boolean buses, S/16 and
T/16, are also used. The logical layout of the Controlled_Zero16 circuit is illustrated in
Figure 3.22.


Figure 3.22: The Controlled_Zero16 gate can either zero out the entire inputted value or output
the original input unaltered.


The next digital logic circuit we designed in SystemC is the Controlled_Not16 gate. Very
similar to the Controlled_Zero16 circuit, the Controlled_Not16 gate negates (or "flips")
each bit of the 16-bit value it receives. The Controlled_Not16 circuit is constructed
with one Mux16 and one NOT16 gate. The 16-bit input bus IN is split in two directions:
one branch runs through the NOT16 gate, n1, before being fed into the Mux16, and the
other is fed directly into the Mux16. The select bit C chooses which value to output as
OUT/16, either the original input value or its negated (complemented) value. The
construction of the Controlled_Not16 logical circuit is illustrated in Figure 3.23.


Figure 3.23: The Controlled_Not16 will either flip each bit of the input bus or output the
inputted value unaltered.


3.13  Two  more  arithmetic  circuits  must  be  constructed 
prior  to  the  ALU 

As we near the ability to construct an ALU for our HACK machine, only two more
arithmetic circuits are needed. The first of these is the AND_or_ADD16 circuit. The
AND_or_ADD16 circuit requires one AND16 gate, one ADD16 gate, and one Mux16 gate.
The inputs, A/16 and B/16, are wired to both the AND16 gate and the ADD16 gate. The
outputs from the AND16 and the ADD16 are then fed into the Mux16. The select bit C
determines which of the two values is selected for the output OUT/16. If the C bit
is zero ("0"), the output of the AND16 gate is passed through as the output. Otherwise, if
C is one ("1"), the output of the ADD16 is passed through as the final result, OUT/16.
Figure 3.24 illustrates the logical arrangement of the AND_or_ADD16 circuit.

Figure 3.24: The AND_or_ADD16 circuit

The last logic circuit required before implementing the ALU is the check_16. The check_16
produces a 16-bit boolean output bus, as well as two single-bit boolean output signals, ZR
and NG (ZR is set when the value is zero, NG when it is negative). To construct the
check_16 circuit in SystemC, fifteen OR gates, sixteen AND gates, and one NOT gate are
used (alternatively, two Or8way gates, one two-way OR gate, and one NOT gate can be
used). The internal wiring is illustrated in Figure 3.25.


Figure 3.25: The Check_16 circuit. Inputs IN[0] through IN[15] produce outputs OUT[0]
through OUT[15], ZR, and NG.

3.14 Time to build the ALU


We now have all the digital logic tools required to build the HACK’s ALU (Arithmetic Logic
Unit). In the words of Nisan and Schocken, "The centerpiece of the computer’s architecture
is the CPU... the centerpiece of the CPU is the ALU, or Arithmetic-Logic Unit" [12].

The ALU executes all the arithmetic and logical operations performed by a computer.
Depending on what operations a computer designer wants the machine to calculate, the
exact operations of the ALU may vary from one computer design to another. In our case,
we are designing the HACK machine as specified by Nisan and Schocken. Nisan and
Schocken state, "The Hack ALU computes a fixed set of functions out = f_i(x, y) where x
and y are the chip’s two 16-bit inputs, out is the chip’s 16-bit output, and f_i is an arithmetic
or logical function selected from a fixed repertoire of eighteen possible functions. We
instruct the ALU which function to compute by setting six input bits, called control bits, to
selected binary values" [12].

To construct the ALU in SystemC, two boolean input buses, X/16 and Y/16, and six
input signals (ZX, ZY, NX, NY, F, and NO) are used to produce the output bus OUT/16
and two output signals, ZR and NG. The digital logic circuits used are two
Controlled_Zero16 circuits (named cz1 and cz2), three Controlled_Not16 circuits (named
cn1, cn2, and cn3), an AND_or_ADD16 unit (referred to as aa), and a check_16 circuit
(simply named check). Figure 3.26 provides a visual representation of the ALU’s logical
construction.
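
The ALU composition described above can be sketched behaviorally in plain C++. This is an illustration under stated assumptions, not the thesis's SystemC code: all function and type names are hypothetical, and the order in which the named components are chained (zero, then negate, on each input; then AND or ADD; then negate the result) follows the Hack ALU specification in [12] rather than anything the text spells out.

```cpp
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;

struct AluOut { Bus16 out; bool zr, ng; };

// Controlled_Zero16: all zeros when c is set, otherwise the input unchanged.
inline Bus16 controlled_zero16(Bus16 in, bool c) { return c ? Bus16{0} : in; }

// Controlled_Not16: bitwise complement when c is set, otherwise unchanged.
inline Bus16 controlled_not16(Bus16 in, bool c) {
    return c ? static_cast<Bus16>(~in) : in;
}

// AND_or_ADD16: AND when c is 0, 16-bit addition when c is 1.
inline Bus16 and_or_add16(Bus16 a, Bus16 b, bool c) {
    return c ? static_cast<Bus16>(a + b) : static_cast<Bus16>(a & b);
}

// ALU: component instance names from the text appear in the comments.
inline AluOut alu(Bus16 x, Bus16 y,
                  bool zx, bool nx, bool zy, bool ny, bool f, bool no) {
    const Bus16 x1 = controlled_not16(controlled_zero16(x, zx), nx);  // cz1, cn1
    const Bus16 y1 = controlled_not16(controlled_zero16(y, zy), ny);  // cz2, cn2
    const Bus16 out = controlled_not16(and_or_add16(x1, y1, f), no);  // aa, cn3
    return {out, out == 0, (out & 0x8000u) != 0};                     // check
}

// Self-check using control-bit settings from the Hack specification.
inline void check_alu() {
    assert(alu(2, 3, false, false, false, false, true, false).out == 5);   // x+y
    assert(alu(2, 3, true, false, true, false, true, false).zr);           // 0
    assert(alu(2, 3, false, true, false, false, true, true).ng);           // x-y
}
```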


3.15  Constructing  more  hardware:  a  single-bit  and  16-bit 
register 

Next, we designed a single-bit register and then built a larger, 16-bit register. The
single-bit register in SystemC takes a boolean input, IN, and two boolean input
signals, LOAD and CLOCK. It is built from the MUX logic gate and a data flip-flop
(commonly referred to as a "dff"). Two internal signals, S and T, are used, and the
output signal is OUT. See Figure 3.27 for a visual representation of the single-bit
register’s composition.




Figure  3.27:  The  internal  composition  of  a  single-bit  register. 

With the single-bit register constructed, we were able to combine sixteen of them. The
16-bit register (also simply referred to as register) consists of 16 single-bit registers
(b0-b15), a 16-bit input bus, a LOAD bit, and a CLOCK bit. Its full logical layout is
shown in Figure 3.28.


Figure  3.28:  A  16-bit  register  is  capable  of  holding  the  16-bit  input  values  HACK  uses  to  operate. 
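
The register construction can be sketched in plain C++ by modeling each clock edge as a call to a tick function. This is only an illustration, not the thesis's SystemC code: the names Dff, BitRegister, Register16, tick, and check_register are hypothetical, and replacing the CLOCK signal with an explicit method call is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;

// Flip-flop: the value presented at a clock tick becomes the new output.
struct Dff {
    bool out = false;
    void tick(bool d) { out = d; }
};

// Single-bit register: a Mux feeds either the held bit or IN to the flip-flop.
struct BitRegister {
    Dff dff;
    bool out() const { return dff.out; }
    void tick(bool in, bool load) {
        const bool next = load ? in : dff.out;  // Mux output
        dff.tick(next);
    }
};

// 16-bit register: sixteen bit registers (b0..b15) sharing LOAD and CLOCK.
struct Register16 {
    BitRegister b[16];
    Bus16 out() const {
        Bus16 v = 0;
        for (int i = 0; i < 16; ++i)
            if (b[i].out()) v = static_cast<Bus16>(v | (1u << i));
        return v;
    }
    void tick(Bus16 in, bool load) {
        for (int i = 0; i < 16; ++i) b[i].tick((in >> i) & 1, load);
    }
};

// Self-check: a value is stored only when LOAD is set.
inline void check_register() {
    Register16 r;
    r.tick(0xBEEF, true);
    assert(r.out() == 0xBEEF);
    r.tick(0x1234, false);
    assert(r.out() == 0xBEEF);
}
```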


3.16  Let’s  store  some  memory 

Having constructed the 16-bit register, we can now construct the random access memory
(RAM). The RAM8 unit allows us to address eight words of memory. Once we have chosen
an address, we can either read information from that location or write information
to it. We use recursive ascent to build larger memories: each memory is assembled from
eight copies of the next smaller one. The following circuits demonstrate the approach.

To construct the RAM8, eight 16-bit registers, a D-mux_8, and a Mux8way16 are used.
Additionally, three inputs (LOAD, CLOCK, and ADDR) are needed. The circuit
is shown in Figure 3.29.

To build the RAM64, eight RAM8s, a D-mux_8, and a Mux8way16 are used. A boolean
input array IN/16 is routed into the RAM64 circuit together with a LOAD and a CLOCK bit.
The circuit is shown in Figure 3.30.


Figure 3.29: RAM8


Figure 3.30: RAM64. ADDR bits 0, 1, and 2 run to each RAM8 circuit (r0-r7).
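
The recursive construction of the memories can be sketched with a C++ template in which each level holds eight copies of the level below. This is a behavioral illustration, not the thesis's SystemC code: the names Register16, Ram8Of, Ram8, Ram64, and check_ram are hypothetical, the register is modeled as a plain stored word, and the demultiplexer and multiplexer are reduced to index arithmetic on the upper three address bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;

// Base case: a 16-bit register that stores its input on a tick when load is set.
struct Register16 {
    Bus16 value = 0;
    Bus16 read(unsigned /*addr*/) const { return value; }
    void tick(Bus16 in, bool load, unsigned /*addr*/) { if (load) value = in; }
};

// Recursive step: eight children behind a demultiplexer (routing LOAD) and a
// multiplexer (selecting the output). AddrBits is the child's address width.
template <typename Child, unsigned AddrBits>
struct Ram8Of {
    std::array<Child, 8> bank{};
    Bus16 read(unsigned addr) const {
        const unsigned low = addr & ((1u << AddrBits) - 1u);  // passed to the child
        return bank[(addr >> AddrBits) & 7u].read(low);       // Mux8way16
    }
    void tick(Bus16 in, bool load, unsigned addr) {
        const unsigned hi = (addr >> AddrBits) & 7u;
        const unsigned low = addr & ((1u << AddrBits) - 1u);
        for (unsigned i = 0; i < 8; ++i)                      // demux of LOAD
            bank[i].tick(in, load && i == hi, low);
    }
};

using Ram8  = Ram8Of<Register16, 0>;  // eight registers
using Ram64 = Ram8Of<Ram8, 3>;        // eight RAM8 units

// Self-check: a write lands at one address and leaves its neighbor untouched.
inline void check_ram() {
    Ram64 m;
    m.tick(42, true, 37);
    assert(m.read(37) == 42);
    assert(m.read(36) == 0);
}
```

Extending the same pattern with `Ram8Of<Ram64, 6>` would give a 512-word memory under the same assumptions.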


3.17  Making  more  and  more  memory 


With the RAM64 now implemented, the next step was to continue to increase memory
size. We did so, building the RAM512 and ultimately the RAM4K. While building the
RAM4K, we found that the simulation severely strained the memory resources of our own
computer, and we were ultimately forced to use a machine with 16 GBytes of
RAM (the simulation of the RAM4K used 9 GBytes). Figures 3.31 and 3.32 illustrate the
RAM512 and RAM4K. Because the memory requirement became so intense, we elected
to stop constructing larger memory sizes.


Figure 3.32: RAM4K. ADDR bits 0-8 run directly to each RAM_512 circuit.


3.18  The  Program  Counter 

Now that we have constructed memory, our next task as we work towards building the
HACK machine is to create a program counter. The program counter accepts as input a
boolean bus IN/16 and boolean signals LOAD, INC, RESET, and CLOCK and produces a
16-bit boolean output OUT/16. The program counter uses a Mux16, a Controlled_Zero16,
one 16-bit register, and an INC_16. There are four internal boolean buses and two internal
boolean signals inside the program counter. The logic design of the program counter is
shown in Figure 3.33.
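
One plausible way the listed components combine is sketched below in plain C++. This is an assumption-laden illustration, not the thesis's SystemC wiring: the names ProgramCounter, tick, and check_pc are hypothetical, the clock is modeled as a method call, and the priority order (RESET over LOAD over INC) is taken from the Hack specification in [12], since the text lists the parts without describing their connections.

```cpp
#include <cassert>
#include <cstdint>

using Bus16 = std::uint16_t;

struct ProgramCounter {
    Bus16 out = 0;  // contents of the 16-bit register

    // One clock tick. RESET wins over LOAD, which wins over INC.
    void tick(Bus16 in, bool load, bool inc, bool reset) {
        const Bus16 incremented = static_cast<Bus16>(out + 1);  // INC_16
        const Bus16 chosen = load ? in : incremented;           // Mux16
        const Bus16 next = reset ? Bus16{0} : chosen;           // Controlled_Zero16
        const bool write = load || inc || reset;                // register LOAD
        if (write) out = next;                                  // otherwise hold
    }
};

// Self-check of counting, loading, holding, and resetting.
inline void check_pc() {
    ProgramCounter pc;
    pc.tick(0, false, true, false);
    pc.tick(0, false, true, false);
    assert(pc.out == 2);
    pc.tick(10, true, true, false);
    assert(pc.out == 10);
    pc.tick(0, false, false, false);
    assert(pc.out == 10);
    pc.tick(5, true, true, true);
    assert(pc.out == 0);
}
```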


3.19  Now  it’s  time  to  jump 

Figure 3.33: The "Program Counter"


An integral part of any CPU is the ability to jump while running a program. Thus,
we next built the JumpDetermination circuit. There are three bits in each 16-bit instruction
that specify whether or not to jump. These three bits are named J1bit, J2bit, and J3bit.
The circuit must also know whether the instruction is a C-instruction; thus, we have the
isCinstruction bit. The NG bit (informing us if the number is negative) and the ZR bit
(informing us if it is zero) are also needed.

The JumpDetermination requires a total of 25 AND gates, seven NOT gates, and seven OR
gates. The logic design is illustrated in Figure 3.34.
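
The decision the JumpDetermination circuit makes can be written compactly in plain C++. This sketch does not reproduce the thesis's gate-level netlist or its gate counts: the names jump_determination and check_jump are hypothetical, and the meaning of each jump bit (J1 for a negative result, J2 for zero, J3 for positive) is taken from the Hack specification in [12].

```cpp
#include <cassert>

// Jump when the instruction is a C-instruction and the ALU result matches
// a condition selected by the jump bits.
inline bool jump_determination(bool j1, bool j2, bool j3,
                               bool is_c_instruction, bool ng, bool zr) {
    const bool positive = !ng && !zr;  // neither negative nor zero
    const bool condition = (j1 && ng) || (j2 && zr) || (j3 && positive);
    return is_c_instruction && condition;
}

// Self-check: JLE (j1 j2 j3 = 1 1 0) jumps on zero or negative results only.
inline void check_jump() {
    assert(jump_determination(true, true, false, true, false, true));    // zero
    assert(jump_determination(true, true, false, true, true, false));    // negative
    assert(!jump_determination(true, true, false, true, false, false));  // positive
}
```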


3.20  Finally,  the  CPU 

We are finally ready to build the central processing unit, or CPU. The HACK computer, a
von Neumann machine, stores data and instructions in memory. The machine language is
called the HACK machine language. A comprehensive explanation of the HACK machine
language is provided in Chapter 4 of The Elements of Computing Systems [12]. Each 16-bit
word is interpreted according to its most significant bit (MSB). If the MSB
is set to zero (0), the bits are interpreted as an address or data. If the MSB is set to one
(1), the bits are interpreted as an instruction [12].

Instructions tell the CPU what operations the user wants performed (logical,
arithmetic, etc.). There are four main fields in these HACK instructions (known as
C-instructions). Figure 3.35 shows the fields. Seven bits (a, c1, c2, c3, c4, c5, and c6)
instruct the ALU what operation to perform. The destination bits (d1, d2, and d3) inform
the CPU where to send the ALU’s output, and the jump bits (j1, j2, and j3) determine
whether a "jump" is needed.


Figure 3.34: JumpDetermination


         isC          comp                   dest        jump
binary:  1     X  X   a  c1 c2 c3 c4 c5 c6   d1 d2 d3    j1 j2 j3

Figure 3.35: The C-instruction’s four fields.
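
The field layout in Figure 3.35 can be turned into a small decoder in plain C++. This is a sketch for illustration, not part of the thesis's SystemC design: the names Fields, decode, and check_decode are hypothetical, and the bit positions follow the left-to-right order of the figure, with the leftmost bit as the MSB.

```cpp
#include <cassert>
#include <cstdint>

struct Fields {
    bool is_c;         // MSB: 1 for a C-instruction, 0 for an A-instruction
    unsigned a;        // the a bit
    unsigned comp;     // c1..c6, with c1 as the most significant bit
    unsigned dest;     // d1 d2 d3
    unsigned jump;     // j1 j2 j3
    unsigned address;  // low 15 bits, meaningful only for an A-instruction
};

inline Fields decode(std::uint16_t word) {
    Fields f{};
    f.is_c = (word & 0x8000u) != 0;
    f.address = word & 0x7FFFu;
    if (f.is_c) {
        f.a    = (word >> 12) & 0x1u;
        f.comp = (word >> 6) & 0x3Fu;
        f.dest = (word >> 3) & 0x7u;
        f.jump = word & 0x7u;
    }
    return f;
}

// Self-check against two words from the listings: M=D and @16.
inline void check_decode() {
    const Fields md = decode(0b1110001100001000);
    assert(md.is_c && md.a == 0 && md.comp == 0b001100);
    assert(md.dest == 0b001 && md.jump == 0);
    const Fields at16 = decode(0b0000000000010000);
    assert(!at16.is_c && at16.address == 16);
}
```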


The CPU evaluates the MSB, checking to see whether the bits represent a C-instruction or
an A-instruction. The CPU fetches (i.e., reads) a word from the instruction memory,
decodes it, and executes the specified instruction [12].

Two figures illustrate the function of the CPU and its internal components. Figure 3.36
shows the inputs and outputs of the CPU at a high level of abstraction. Figure 3.37 shows
the internal components of the CPU and the logical connections needed.


Figure 3.36: A high-level diagram of the CPU showing both inputs (instruction/16, inM,
and reset) and outputs (outM/16, writeM/1, addressM/15, and pc/15).


Figure 3.37: A low-level diagram of the CPU shows the required internal logical circuits. Each
circled c refers to control logic.


3.21 And now the computer

To accommodate SystemC constraints, we deviated from Nisan and Schocken in the following
respects. First, the size of the memory (both instruction memory and data memory) was
limited to 64 addresses (i.e., RAM64). When simulating HACK in SystemC with memory
sizes larger than RAM64, the simulation required much more than 4 GBytes. However, the
RAM64 size was sufficient to demonstrate small programs. The second deviation from the
Nand2Tetris model was the lack of a pre-built ROM (Read-Only Memory). To address this
problem, we used a RAM64 logic circuit as instruction memory.

The HACK computer has three main components: an instruction memory, the CPU, and
a data memory. The first program we tested on the HACK computer was an addition
program. This simple program adds the numbers two and three together. If all logic gates have
been assembled and wired together correctly, the output will be five. This program requires
six instructions; they are the following:

//@2
0000000000000010
//D=A
1110110000010000
//@3
0000000000000011
//D=D+A
1110000010010000
//@0
0000000000000000
//M=D
1110001100001000

To make this program work, we assemble the computer as depicted in Figure 3.38. To
load the program into instruction memory, we load each instruction one by one, making
sure that the SET bit is enabled and the GET bit disabled during the loading process. Next,
we run the program for a predetermined, fixed number of clock cycles, with SET and GET
disabled. Finally, we read the contents of the data memory at address zero. This requires
one additional clock cycle with the GET bit enabled and the SET bit disabled. At the
completion of the simulation, the contents of address zero equal five, exactly what we
expected.


Figure 3.38: The HACK computer consists of three main parts: instruction memory, the CPU,
and data memory.


CHAPTER 4:

Experimental Setup and Results


Our thesis project both constructs a simple modern computer utilizing SystemC and
facilitates the design and integration of policy enforcement circuitry. Having constructed the
HACK computer described in the previous chapter and verified that it can correctly run a
simple program, our next step was to run a more complex program and verify that the
computer yielded a correct output. We chose to run the multiply program provided by Nisan
and Schocken [12]. The instructions of the multiply program are provided below:


// This is the multiply program - it is a simple test program for multiplying
// two numbers. We will initialize the instruction memory with this program.
// M[0] should contain the value 30 at the end of the program.

// @5
0000000000000101
// D=A
1110110000010000
// @R1
0000000000000001
// M=D
1110001100001000
// @6
0000000000000110
// D=A
1110110000010000
// @R2
0000000000000010
// M=D
1110001100001000
// @R0
0000000000000000
// M=0
1110101010001000
// @i
0000000000010000
// M=0
1110101010001000
// (LOOP)
// @R1
0000000000000001
// D=M
1111110000010000
// @i
0000000000010000
// D=D-M
1111010011010000
// @END
0000000000011010
// D;JLE
1110001100000110
// @R2
0000000000000010
// D=M
1111110000010000
// @R0
0000000000000000
// M=M+D
1111000010001000
// @i
0000000000010000
// M=M+1
1111110111001000
// @LOOP
0000000000001100
// 0;JMP
1110101010000111
// (END)
// @END
0000000000011010
// 0;JMP
1110101010000111


The multiply program consists of 28 instructions; thus, the program length is set at 28.
Unlike the simple addition program in Chapter 3, the multiply program also contains
"looping" via conditional jumps. Because of this feature, we had to increase the max value
to allow enough clock cycles to complete the program. We chose to set max to a value
of 128. At the conclusion of the 128 clock cycles, the value of M[0] was thirty, exactly
what was expected.

With both programs (addition and multiply) yielding correct results, our
confidence in the HACK machine was sufficient to move to the second
portion of our project: the design of policy enforcement circuitry. To demonstrate built-in
policy enforcement capabilities, we designed an integrity checker.


4.1  Creating  the  Integrity  Checker 

The integrity checker protects the HACK machine’s data memory against unauthorized
modifications. Our simple demonstration assumes that the first four lines of instruction
code come from an "untrusted" source; thus, we want to prevent these instructions from
writing to unauthorized memory locations. We chose to allow untrusted code to write to
the first sixteen data memory addresses, but to prevent it from writing to any
other addresses.

We constructed the integrity checker using four OR gates, five AND gates, and six NOT
gates. The program counter’s second, third, fourth, and fifth bits (pc2 through pc5) are
fed in to check whether the instruction resides in the trusted or untrusted portion of the
code. If the integrity checker (i.e., "checker") detects the value in the program counter
to be less than four, it will deny writes to data addresses greater than or equal to
sixteen. If any of the values of pc2, pc3, pc4, or pc5 is one, this indicates that the
instruction comes from the trusted portion of the code; thus, the program is allowed to
write to any location in data memory. Figure 4.1 illustrates the logical layout of the
integrity checker described above.




Figure  4.1:  The  Integrity  Checker. 
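
The policy the checker enforces can be expressed as a short boolean function in plain C++. This sketch does not reproduce the thesis's gate-level circuit or its gate counts: the names bit, allow_write, gated_write, and check_policy are hypothetical, and the use of address bits 4 and 5 assumes the six-bit addresses of the RAM64 data memory.

```cpp
#include <cassert>

// Extracts one bit of a value (hypothetical helper).
inline bool bit(unsigned value, unsigned index) {
    return ((value >> index) & 1u) != 0;
}

// allow = instruction is trusted, OR the target address is below sixteen.
inline bool allow_write(unsigned pc, unsigned address) {
    // Any of pc2..pc5 set means pc >= 4, i.e. the trusted region.
    const bool trusted = bit(pc, 2) || bit(pc, 3) || bit(pc, 4) || bit(pc, 5);
    // Assumed 6-bit address: below sixteen when bits 4 and 5 are both clear.
    const bool low_address = !bit(address, 4) && !bit(address, 5);
    return trusted || low_address;
}

// The allow bit gates the CPU's write request before it reaches data memory.
inline bool gated_write(bool write_m, unsigned pc, unsigned address) {
    return write_m && allow_write(pc, address);
}

// Self-check of the four combinations of region and address range.
inline void check_policy() {
    assert(allow_write(1, 0));    // untrusted, low address: allowed
    assert(!allow_write(3, 16));  // untrusted, high address: denied
    assert(allow_write(7, 0));    // trusted, low address: allowed
    assert(allow_write(9, 16));   // trusted, high address: allowed
}
```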


4.2  Incorporating  the  Integrity  Checker  in  the  HACK 
computer  design 

Integrating the integrity checker into the HACK computer design is straightforward, merely
affecting the "writeM" bit, which proceeds from the CPU and is fed into mux2. The mux2
gate accommodates the GET bit. The output of the mux2 gate is then ANDed with the
output of the checker. If the writeM bit is enabled and the allow bit proceeding from the
checker is also enabled, the output of the CPU, outM, will be written into data memory at
the specified address. Conversely, if the integrity checker detects that an instruction in the
untrusted portion of the code is trying to write to a prohibited address, the allow bit will be
disabled, and writing to data memory will be denied.


Figure 4.2: The HACK computer with the "INTEGRITY CHECKER" installed.

4.3  The  Test 

To demonstrate that the integrity checker correctly prevents writes to unauthorized
locations, we created and ran the following test program:

//(UNTRUSTED)
//@R0
0000000000000000
//M=1
1110111111001000
//@16
0000000000010000
//M=1
1110111111001000
//(TRUSTED)
//@7
0000000000000111
//D=A
1110110000010000
//@R0
0000000000000000
//M=D
1110001100001000
//@16
0000000000010000
//M=D
1110001100001000
//(END)
//@END
0000000000001010
//0;JMP
1110101010000111

As previously discussed, the first four instructions are considered untrusted. The next eight
instructions are considered trusted. To validate the functionality of the checker, we first
ran the test code in the HACK machine without installing the integrity checker. After
two clock cycles, the program should write the value of one (1) into M[0] (data memory
address zero). After four clock cycles, the program should write a value of one (1) into
M[16]. After eight clock cycles, the program should write a value of seven (7) into M[0],
and after ten clock cycles the value of seven should also be written into M[16]. Running
the program yielded the expected results, as shown in Table 4.1. For this program, we set
max to twenty-four.


Memory Address    Clock cycles    Data Values
M[0]              2               1
M[16]             4               1
M[0]              8               7
M[16]             10              7

Table 4.1: Memory values after running the "test" program in the HACK machine without
installing the integrity checker.

We next ran the "test" program on the HACK computer with the integrity checker installed.
Untrusted code can still write to M[0-15] of data memory, but not to data memory locations
greater than or equal to sixteen. As shown in Table 4.2, M[0] has a value of one after two
clock cycles, M[16] remains uninitialized after four clock cycles, and the values of M[0]
and M[16] are set to seven by the trusted instructions as before.


Memory Address    Clock cycles    Data Values
M[0]              2               1
M[16]             4               uninitialized
M[0]              8               7
M[16]             10              7

Table 4.2: Memory values after running the "test" program with the checker installed.
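
The behavior recorded in Tables 4.1 and 4.2 can be reproduced with a small instruction-level emulator in plain C++. This is a sketch under stated assumptions, not the thesis's gate-level SystemC simulation: the names Machine, alu, step, and run_test_program are hypothetical, one instruction is assumed to execute per clock cycle, and the instruction semantics follow the Hack specification in [12].

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Word = std::uint16_t;

// ALU driven by the six control bits c1..c6 (zx, nx, zy, ny, f, no).
inline Word alu(Word x, Word y, unsigned c) {
    if (c & 0x20u) x = 0;                                       // zx
    if (c & 0x10u) x = static_cast<Word>(~x);                   // nx
    if (c & 0x08u) y = 0;                                       // zy
    if (c & 0x04u) y = static_cast<Word>(~y);                   // ny
    Word out = static_cast<Word>((c & 0x02u) ? x + y : x & y);  // f
    if (c & 0x01u) out = static_cast<Word>(~out);               // no
    return out;
}

struct Machine {
    std::vector<Word> rom;  // instruction memory
    Word ram[64] = {};      // data memory (RAM64)
    bool written[64] = {};  // which cells have ever been written
    Word a = 0, d = 0;      // A and D registers
    unsigned pc = 0;
    bool checker = false;   // is the integrity checker installed?

    // Policy from Section 4.1: pc < 4 is untrusted and may write only M[0..15].
    bool allow(unsigned address) const {
        return !checker || pc >= 4 || address < 16;
    }

    void step() {
        const Word ins = rom.at(pc);
        if ((ins & 0x8000u) == 0) { a = ins; ++pc; return; }  // A-instruction
        const unsigned address = a % 64u;                     // taken before A changes
        const Word y = (ins & 0x1000u) ? ram[address] : a;    // a bit: M or A
        const Word out = alu(d, y, (ins >> 6) & 0x3Fu);
        const Word target = a;                                // jumps use the old A
        if ((ins & 0x08u) && allow(address)) {                // d3: write M
            ram[address] = out;
            written[address] = true;
        }
        if (ins & 0x10u) d = out;                             // d2: write D
        if (ins & 0x20u) a = out;                             // d1: write A
        const bool ng = (out & 0x8000u) != 0, zr = (out == 0);
        const bool jump = ((ins & 4u) && ng) || ((ins & 2u) && zr) ||
                          ((ins & 1u) && !ng && !zr);
        pc = jump ? target : pc + 1;
    }
};

// Runs the test program from Section 4.3 for the given number of cycles.
inline Machine run_test_program(bool checker, int cycles) {
    Machine m;
    m.checker = checker;
    m.rom = {0b0000000000000000, 0b1110111111001000,   // @R0, M=1 (untrusted)
             0b0000000000010000, 0b1110111111001000,   // @16, M=1 (untrusted)
             0b0000000000000111, 0b1110110000010000,   // @7,  D=A
             0b0000000000000000, 0b1110001100001000,   // @R0, M=D
             0b0000000000010000, 0b1110001100001000,   // @16, M=D
             0b0000000000001010, 0b1110101010000111};  // @END, 0;JMP
    for (int i = 0; i < cycles; ++i) m.step();
    return m;
}
```

Calling `run_test_program(true, 4)` and inspecting `written[16]` corresponds to the "uninitialized" row of Table 4.2, under the one-instruction-per-cycle assumption above.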


4.4  Impact  on  system  performance 

To measure the impact of the integrity checker on system performance, we ran the test
program both with and without the integrity checker installed. We measured simulation
time using the UNIX ’time’ command. The timing results are listed in Tables 4.3 and 4.4,
with max set to a value of twenty-four. Installing the integrity checker has minimal impact
on system performance (averages with checker installed: Real 2.608, User 2.276, and
System 0.285; averages without checker installed: Real 2.601, User 2.275, and
System 0.284).


Tests run without security mechanism

          Test One    Test Two    Test Three    Test Four
Real      2.616       2.603       2.602         2.581
User      2.279       2.278       2.263         2.280
System    0.286       0.277       0.289         0.283

Table 4.3: Timing results of running the test program without the integrity checker.


Tests run with security mechanism

          Test One    Test Two    Test Three    Test Four
Real      2.608       2.609       2.620         2.598
User      2.273       2.275       2.290         2.269
System    0.289       0.285       0.286         0.282

Table 4.4: Timing results of running the test program with the integrity checker.


CHAPTER 5:

Conclusion and Future Work


The growth in system complexity enabled by Moore’s Law poses major challenges for
designers to ensure correct operation, security, fault tolerance, and other key properties. Since
electronic design automation (EDA) tools (e.g., logic synthesis) first appeared decades ago,
the complexity of both hardware and software has risen dramatically. To tackle this
problem, high-level synthesis (HLS) is a design methodology being embraced by major chip
manufacturers, including Xilinx, which incorporates HLS into its Vivado design software
suite. Though Vivado does not use SystemC, it uses another C-like language to express
both hardware and software. HLS is related to electronic system-level (ESL) design, which
allows a chip designer to express an algorithm in a high-level language, and the design tools
automatically translate the high-level specification into a cycle-accurate system. Since
designers wishing to build a secure system must be able to fully comprehend the system and
have mastery over the design tools that practitioners are using, HLS has the potential to
facilitate trustworthy system development by helping designers efficiently express
hardware and software components for computation, communication, security, reliability, etc.
Also, since modeling languages like SystemC allow designers to express both hardware
and software, they facilitate co-simulation of hardware and software in the same
simulation environment, which offers the potential of greater efficiency than traditional methods.

To explore the application of HLS to trustworthy system development, this thesis applied
an HLS design approach to a simple 16-bit computer, implementing it in the popular SystemC
modeling language. With only a C++ compiler and the SystemC library, we were able to
design and simulate the CPU, instruction memory, and data memory. Compiling the C++ code
and linking it with the SystemC library generates an executable, and running the executable
performs a simulation of the hardware. While simulating the computer’s hardware, we were
able to run programs written in the machine language of the CPU on the simulated hardware.
We had to reduce the size of our simulated computer’s memory due to the well-known problem
of fine-grained simulations requiring large amounts of memory as system complexity
increases. We integrated a simple memory integrity policy enforcement mechanism, also
designed in SystemC, into our design. Our simulation results show that the mechanism
correctly enforces the integrity policy and has minimal impact on overall system
performance.
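
The integrity policy summarized above can be sketched as a single predicate in plain C++. This is a minimal illustration and not the SystemC module developed in this thesis. The struct IntegrityPolicy, its field names, and the function write_allowed are hypothetical, and the sketch assumes that every instruction outside the trusted program region is treated as untrustworthy.

```cpp
#include <cstdint>

// Hypothetical description of the policy. All bounds are half-open
// intervals [begin, end) over 16-bit addresses.
struct IntegrityPolicy {
    std::uint16_t trusted_pc_begin;      // first instruction address of the trusted region
    std::uint16_t trusted_pc_end;        // one past the last trusted instruction address
    std::uint16_t untrusted_data_begin;  // first data address untrusted code may modify
    std::uint16_t untrusted_data_end;    // one past the last such data address
};

// True if x lies in the half-open interval [begin, end).
inline bool in_range(std::uint16_t x, std::uint16_t begin, std::uint16_t end) {
    return x >= begin && x < end;
}

// Decide whether a store issued by the instruction at `pc` may modify
// data memory address `addr`.
inline bool write_allowed(const IntegrityPolicy& policy,
                          std::uint16_t pc, std::uint16_t addr) {
    // Trusted code may modify any address in data memory.
    if (in_range(pc, policy.trusted_pc_begin, policy.trusted_pc_end)) {
        return true;
    }
    // Assumption: all other code is untrusted and confined to one data region.
    return in_range(addr, policy.untrusted_data_begin, policy.untrusted_data_end);
}
```

For example, with a policy of {0x0000, 0x0100, 0x0200, 0x0300}, a store from the instruction at 0x0010 is allowed to any data address, while a store from the instruction at 0x0150 is allowed only to data addresses 0x0200 through 0x02FF. In a hardware model, a check of this kind would sit between the CPU and the data memory write port.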

We see many opportunities for future work. For example, it would be worthwhile to explore
more complex processor designs and more sophisticated security policy enforcement
mechanisms. It would also be useful to learn more about the capabilities of production-grade,
tape-out tools like Vivado HLS (e.g., modeling a system with Vivado and prototyping it on
an inexpensive FPGA board, such as the Basys 2 board from Digilent). We would also like
to overcome the technical hurdles to implementing the 16-bit design more fully, including a
display, keyboard, and larger memory. We would like to optimize our 16-bit design further
to make it more efficient and to compare its implementation in SystemC against the same
design expressed in a traditional HDL. We would like to become more proficient in SystemC
so that we can express circuits more efficiently and elegantly. We would also like to
express our design in other languages, environments, and tool flows. Finally, the broader
aim of our work is to make it easier for computer scientists to leverage the benefits of
custom hardware.